Microtask Crowdsourcing for Disease Mention Annotation in PubMed Abstracts
Abstract
Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators, often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of the AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the 'training set' of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total, 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $0.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however, minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation_with_Mechanical_Turk/1126402.
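The abstract describes merging each worker's span annotations by simple voting, with the vote threshold acting as the knob that trades precision against recall. The sketch below illustrates this idea; the function name, the (start, end) character-offset representation, and the example data are illustrative assumptions, not the authors' actual pipeline.

```python
from collections import Counter

def merge_annotations(worker_spans, min_votes):
    """Merge disease-mention spans from multiple workers by simple voting.

    worker_spans: one collection of annotations per worker; each annotation
                  is a (start, end) character-offset pair within an abstract.
    min_votes:    how many workers must mark a span for it to be kept.
                  A low threshold favors recall; a high one favors precision.
    """
    votes = Counter()
    for spans in worker_spans:
        # Count each distinct span at most once per worker.
        votes.update(set(spans))
    return {span for span, n in votes.items() if n >= min_votes}

# Hypothetical example: three workers annotate the same abstract.
workers = [
    {(10, 24), (88, 101)},          # worker 1
    {(10, 24)},                     # worker 2
    {(10, 24), (88, 101), (5, 9)},  # worker 3
]
# Majority vote (2 of 3) keeps only spans most workers agree on.
print(merge_annotations(workers, min_votes=2))
# -> {(10, 24), (88, 101)}
```

Under this scheme, raising min_votes toward the full worker count pushes precision up at the cost of recall (cf. the reported precision of 0.984 at recall 0.269), while a threshold of 1 keeps every proposed span and does the opposite.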
Similar Resources
Worker Viewpoints: Valuable Feedback for Microtask Designers in Crowdsourcing
One of the problems a requester faces when crowdsourcing a microtask is that, due to an underspecified or ambiguous task description, workers may misinterpret the microtask at hand. We call a set of such interpretations worker viewpoints. In this paper, we argue that assisting requesters to gather a worker's interpretation of the microtask can help in providing useful feedback to designers, who...
Crowd Work CV: Recognition for Micro Work
With an increasing micro-labor supply and a larger available workforce, new microtask platforms have emerged providing an extensive list of marketplaces where microtasks are offered by requesters and completed by crowd workers. The current microtask crowdsourcing infrastructure does not offer the possibility to be recognised for already accomplished and offered work in different microtask platf...
Crowdsourcing for bioinformatics
MOTIVATION Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of a...
Optimal Posted-Price Mechanism in Microtask Crowdsourcing
Posted-price mechanisms are widely adopted to decide the price of tasks in popular microtask crowdsourcing. In this paper, we propose a novel posted-price mechanism which not only outperforms existing mechanisms on performance but also avoids their need of a finite price range. The advantages are achieved by converting the pricing problem into a multi-armed bandit problem and designing an optima...
Understanding Potential Microtask Workers for Paid Crowdsourcing
More and more people leverage the power of crowds to obtain solutions to their problems, and the number of microtask workers also increases rapidly on paid crowdsourcing marketplaces. However, there is an order of magnitude discrepancy between the population of Internet users (≈ 2 billion) and that of microtask workers (≈ 0.5 million); we believe that a large number of potential workers are i...
Journal: Pacific Symposium on Biocomputing
Published: 2015